112 research outputs found
Reproducible research through persistently linked and visualized data
The demand for reproducible results in the numerical simulation of opto-electronic devices, or more generally in mathematical modeling and simulation, requires the (long-term) accessibility of the data and software that were used to generate those results. Moreover, data visualizations such as videos are useful to present those results in a comprehensible manner. Persistent identifiers can be used to ensure the permanent connection of these different digital objects, thereby preserving all information in the right context. Here we give an overview of the state of the art in data preservation and data and software citation, and illustrate the benefits and opportunities of enhancing publications with visual simulation data by showing a use case from opto-electronics.
Towards Semantic Integration of Federated Research Data
Digitization of the research (data) lifecycle has created a galaxy of data nodes that are often characterized by sparse interoperability. With the start of the European Open Science Cloud in November 2018, and facing the upcoming call for the creation of the National Research Data Infrastructure (NFDI), researchers and infrastructure providers will need to harmonize their data efforts. In this article, we propose a recently initiated proof of concept towards a network of semantically harmonized Research Data Management (RDM) systems. This includes a network of research data management and publication systems with semantic integration at three levels, namely data, metadata, and schema. As such, an ecosystem for agile, evolutionary ontology development and the community-driven definition of quality criteria and classification schemes for scientific domains will be created. In contrast to the classical data repository approach, this process will allow for cross-repository as well as cross-domain data discovery, integration, and collaboration, and will lead to open and interoperable data portals throughout the scientific domains. At the joint lab of the L3S research center and TIB Leibniz Information Center for Science and Technology in Hanover, we are developing a solution based on a customized distribution of CKAN called the Leibniz Data Manager (LDM). LDM utilizes CKAN's harvesting functionality to exchange metadata using the DCAT vocabulary. By adding the concept of a semantic schema to LDM, it will contribute to realizing the FAIR paradigm. Variables, their attributes, and the relationships of a dataset will improve findability and accessibility and can be processed by humans or machines across scientific domains.
We argue that it is crucial for RDM development in Germany that domain-specific data silos remain the exception, and that a semantically linked network of generic and domain-specific research data systems and services at the national, regional, and organization levels be promoted within the NFDI initiative.
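The DCAT-based metadata exchange described above can be sketched as follows. This is a minimal illustration, not LDM's actual harvester: the catalog snippet and dataset titles are invented, and a real harvester would fetch the RDF from a remote CKAN endpoint rather than from an inline string.

```python
# Minimal sketch of DCAT metadata harvesting (illustrative only).
import xml.etree.ElementTree as ET

# Hypothetical DCAT catalog snippet in RDF/XML; titles are invented.
DCAT_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog rdf:about="http://example.org/catalog">
    <dcat:dataset>
      <dcat:Dataset rdf:about="http://example.org/ds/1">
        <dct:title>Urban climate simulation run 42</dct:title>
      </dcat:Dataset>
    </dcat:dataset>
    <dcat:dataset>
      <dcat:Dataset rdf:about="http://example.org/ds/2">
        <dct:title>Sensor calibration tables</dct:title>
      </dcat:Dataset>
    </dcat:dataset>
  </dcat:Catalog>
</rdf:RDF>"""

NS = {"dcat": "http://www.w3.org/ns/dcat#",
      "dct": "http://purl.org/dc/terms/"}

def harvest_titles(rdf_xml: str) -> list[str]:
    """Extract the titles of all dcat:Dataset entries in a catalog."""
    root = ET.fromstring(rdf_xml)
    return [t.text for t in root.findall(".//dcat:Dataset/dct:title", NS)]

print(harvest_titles(DCAT_XML))
# ['Urban climate simulation run 42', 'Sensor calibration tables']
```

Because DCAT is plain RDF, the same extraction could equally be done with a full RDF library and a SPARQL query; the namespace URIs used here are the standard W3C ones.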
OSGeo conference videos as a resource for scientific research: The TIB|AV Portal
This paper reports on new opportunities for research and education in Free and Open Source Geoinformatics as a translational part of Open Science, enabled by a growing collection of OSGeo conference video recordings at the German National Library of Science and Technology (TIB). Since 2015, OSGeo conference recordings have been included in TIB's collection sphere for the information sciences. Currently, video content from selected national (FOSSGIS), regional (FOSS4G-NA), and global (FOSS4G) conferences is being actively collected. The annual growth exceeds 100 hours of new content relating to the OSGeo software projects and the OSGeo scientific-technical communities. This is complemented by the retrospective acquisition of video material from past conferences, going back to 2002, to preserve this content and ensure both long-term availability and access. The audiovisual OSGeo-related content is provided through the TIB|AV Portal, a web-based platform for scientific audiovisual media providing state-of-the-art multimedia analysis and retrieval. It implements the requirements of research libraries for reliable long-term preservation. Metadata enhancement analysis provides extended search and retrieval options. Digital Object Identifiers (DOIs) enable the scientific citation of full videos, excerpts, and still frames, their use in education, and referral in social networks. This library-operated service infrastructure turns the audiovisual OSGeo-related content into a reliable source for science and education.
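Citing an excerpt rather than a full video can be sketched by combining a DOI with a W3C Media Fragments time range. This is an assumption-laden illustration: the DOI suffix below is hypothetical, and the AV Portal's actual excerpt-addressing scheme may differ from a plain `#t=start,end` fragment.

```python
# Sketch: addressing a video excerpt via DOI + W3C Media Fragments.
# The DOI suffix "12345" is hypothetical, used only for illustration.
def cite_excerpt(doi: str, start_s: int, end_s: int) -> str:
    """Build a resolvable URL addressing seconds [start_s, end_s)."""
    return f"https://doi.org/{doi}#t={start_s},{end_s}"

print(cite_excerpt("10.5446/12345", 120, 180))
# https://doi.org/10.5446/12345#t=120,180
```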
Leibniz Data Manager – An adaptive Research Data Management System
The increasing demand on researchers to make the underlying research data openly accessible in addition to the classic publication forms, whether voluntarily or due to their institution's or research funders' requirements, can improve the reproducibility of scientific findings. As a result, researchers depend on expressive descriptions of research data for reusability. These descriptions take the form of comprehensive metadata stored in heterogeneous formats in research data repositories. However, finding the appropriate data is arduous, as there is a growing amount of research data stored in various places, and only a few repositories offer the function of displaying a preview of the data. Research work efficiency can benefit from data previews whenever researchers can explore portions of a dataset before deciding on the relevance of the data for accessing and downloading the whole dataset. The Leibniz Data Manager (LDM) is a research data management system that resorts to Semantic Web technologies to empower the FAIR principles. LDM supports searching and exploring research data across various repositories. LDM provides an additional (meta-)data management layer for data collected from existing research data repositories, based on the web-based data catalog software CKAN (Comprehensive Knowledge Archive Network). The primary purpose of LDM is to preview research data, e.g., tables, audio-visual material, AutoCAD files, 2D and 3D data, or live programming code via Jupyter Notebooks, so that their potential for reuse can be easily evaluated. Since LDM is available as a Docker container, anyone can install a local LDM distribution to assist research data management in different phases of the data lifecycle. LDM is accessible at https://service.tib.eu/ldmservice/. LDM empowers researchers by supporting them in preserving their research data as openly and FAIR as possible.
With LDM, researchers can check whether their data are displayed correctly and whether they are available in suitable, and preferably open, data formats before publication. In addition, humans and computational programs can access machine-readable metadata, which can be exported in various schemas (DCAT, DataCite, and Dublin Core) and RDF serializations. This enables automated searching and processing by various databases and tools. More importantly, DataCite DOIs and ORCIDs ensure the persistence and findability of LDM (meta-)data. At the poster session we will demonstrate how scientists can be supported in searching for datasets and preserving their research data. We are also interested in collecting ideas about future requirements to be implemented in upcoming versions of the LDM.
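Exporting one record into several metadata schemas, as described above, can be sketched as follows. This is not LDM's actual export code: the internal record layout is invented, and the DataCite- and Dublin-Core-style field names are an abbreviated, illustrative subset of the real schemas.

```python
# Sketch of exporting one dataset record into two metadata schemas.
# The internal record and both mappings are illustrative only.
import json

record = {  # hypothetical internal record
    "title": "Wind-tunnel measurement series A",
    "creator": "Doe, Jane",
    "publication_year": 2022,
    "doi": "10.1234/example-doi",
}

def to_datacite(rec: dict) -> dict:
    """Map to a (simplified) DataCite-style structure."""
    return {
        "doi": rec["doi"],
        "titles": [{"title": rec["title"]}],
        "creators": [{"name": rec["creator"]}],
        "publicationYear": rec["publication_year"],
    }

def to_dublincore(rec: dict) -> dict:
    """Map to a (simplified) Dublin-Core-style structure."""
    return {
        "dc:title": rec["title"],
        "dc:creator": rec["creator"],
        "dc:date": str(rec["publication_year"]),
        "dc:identifier": f"https://doi.org/{rec['doi']}",
    }

print(json.dumps(to_datacite(record), indent=2))
print(json.dumps(to_dublincore(record), indent=2))
```

The design point is that one canonical internal record feeds all export schemas, so adding a further serialization (e.g., an RDF/DCAT view) only requires one more mapping function.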
Why and how you should publish climate model data
Climate simulations generate large amounts of data. For this reason, not all results of a climate model simulation can usually be analyzed by a single research group alone. The Coupled Model Intercomparison Project (CMIP) therefore places a focus on enabling other research groups to analyze the data as well. For this purpose, there are precise specifications for how these data are to be described and formatted. In addition, many of these datasets are assigned a DOI (Digital Object Identifier). All of this facilitates finding and further processing the data.
However, far more data than the CMIP data are important for climate research. Many results of, for example, regional climate models or urban climate models are not published, even though only a fraction of the results can be analyzed by the data producers themselves. Many funders therefore press for publication of the data in a repository. But even in this case, the data often cannot be reused. The reasons are manifold:
Insufficient metadata
Poor findability, e.g., by search engines
Missing rights for further processing
Missing quality control
The BMBF-funded project AtMoDat (https://www.ATMODAT.de) was started in 2019 to strengthen and improve the publication of atmospheric model data. One method is adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable; see Wilkinson et al., 2016). In addition, all data should be published with a DataCite DOI to improve their findability and citability. Guidance on how to proceed can be found in the standard developed within the AtMoDat project. The ATMODAT standard is easy to implement and contains precise specifications for the DOI metadata, the landing page, and the headers of the netCDF files. Data that meet this standard, and whose compliance has been checked by the respective repository, can be awarded the Earth System Data Branding (EASYDAB). This branding makes it very easy for users to verify that appropriate quality assurance of the data has taken place. The talk presents the standard and EASYDAB.
RADAR – A Repository and Publication Service for Research Data
[no abstract available]
A short guide to increase FAIRness of atmospheric model data
The generation, processing, and analysis of atmospheric model data are expensive, as atmospheric model runs are often computationally intensive and the costs of 'fast' disk space are rising. Moreover, atmospheric models are mostly developed by groups of scientists over many years, and therefore only few appropriate models exist for specific analyses, e.g., for urban climate. Hence, atmospheric model data should be made available for reuse by scientists, the public sector, companies, and other stakeholders. This leads to an increasing need for the swift, user-friendly adoption of standards. The FAIR data principles (Findable, Accessible, Interoperable, Reusable) were established to foster the reuse of data. Research data become findable and accessible if they are published in public repositories with general metadata and Persistent Identifiers (PIDs), e.g., DataCite DOIs. The use of PIDs should ensure that the describing metadata remain persistently available. Nevertheless, PIDs and basic metadata do not guarantee that the data are indeed interoperable and reusable without project-specific knowledge. Additionally, the lack of standardised machine-readable metadata reduces the FAIRness of data. Unfortunately, no common standards are available for non-climate models, e.g., for mesoscale models. This paper proposes a concept to improve the FAIRness of archived atmospheric model data. The concept was developed within the AtMoDat project (Atmospheric Model Data). The approach consists of several aspects, each of which is easy to implement: requirements for rich metadata with controlled vocabulary, the landing pages, file formats (netCDF), and the structure within the files. The landing pages are a core element of this concept, as they should be human- and machine-readable, hold discipline-specific metadata, and present metadata at the simulation and variable level. This guide is meant to help data producers and curators prepare data for publication.
Furthermore, this guide provides information on the choice of keywords, which supports data reusers in their search for data via search engines. © 2020 The author
The ATMODAT Standard enhances FAIRness of Atmospheric Model data
Within the AtMoDat project (Atmospheric Model Data, www.atmodat.de), a standard has been developed to improve the FAIRness of atmospheric model data published in repositories. Atmospheric model data form the basis for understanding and predicting natural events, including atmospheric circulation, local air quality patterns, and the planetary energy budget. Such data should be made available for evaluation and reuse by scientists, the public sector, and relevant stakeholders.
Atmospheric modeling is ahead of many other fields on the way towards FAIR (Findable, Accessible, Interoperable, Reusable; see, e.g., Wilkinson et al. (2016, doi:10.1038/sdata.2016.18)) data: many models write their output directly into netCDF or into file formats that can be converted into netCDF. NetCDF is a non-proprietary, binary, and self-describing format, ensuring interoperability and facilitating reusability. Nevertheless, consistent human- and machine-readable standards for discipline-specific metadata are also necessary. While the standardisation of file structure and metadata (e.g., the Climate and Forecast (CF) Conventions) is well established for some subdomains of the earth system modeling community (e.g., the Coupled Model Intercomparison Project, Juckes et al. (2020, https://doi.org/10.5194/gmd-13-201-2020)), other subdomains still lack such standardisation. For example, standardisation is not well advanced for obstacle-resolving atmospheric models (e.g., for urban-scale modeling).
The ATMODAT standard, which will be presented here, includes concrete recommendations related to the maturity, publication, and enhanced FAIRness of atmospheric model data. The suggestions include requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF), and the structure within the files. Human- and machine-readable landing pages are a core element of this standard and should hold and present discipline-specific metadata at the simulation and variable level.
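A compliance check of the kind a repository might run against such a standard can be sketched as a test of netCDF global attributes. The required-attribute list below is a hypothetical, heavily abbreviated stand-in: the real ATMODAT standard defines many more attributes and rules, so this only illustrates the mechanism, not the standard itself.

```python
# Sketch of a metadata completeness check in the spirit of the
# ATMODAT standard. REQUIRED is an illustrative subset, not the
# standard's actual attribute list.
REQUIRED = {"title", "institution", "source", "Conventions", "license"}

def check_global_attrs(attrs: dict) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = [f"missing attribute: {k}"
                for k in sorted(REQUIRED - attrs.keys())]
    conv = attrs.get("Conventions", "")
    if conv and not conv.startswith("CF-"):
        # The standard builds on the CF Conventions for netCDF files.
        problems.append(f"unexpected Conventions value: {conv!r}")
    return problems

attrs = {
    "title": "Obstacle-resolving urban climate run",
    "institution": "Example University",
    "source": "Model v1.0",
    "Conventions": "CF-1.8",
}
print(check_global_attrs(attrs))  # ['missing attribute: license']
```

In practice such a checker would read the attributes from the header of an actual netCDF file (e.g., via the netCDF4 library) rather than from an inline dictionary.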